A Novel BiLevel Paradigm for Image-to-Image Translation
Image-to-image (I2I) translation is a pixel-level mapping that requires large
amounts of paired training data and often suffers from high diversity and
strong category bias in image scenes. To tackle these problems, we propose a
novel BiLevel (BiL) learning paradigm that
alternates the learning of two models, at an instance-specific (IS) level and
a general-purpose (GP) level, respectively. In each scene, the IS model learns
to maintain the scene's specific attributes. It is initialized by the GP
model, which learns from all scenes to obtain generalizable translation
knowledge. This GP initialization gives the IS model an efficient starting
point, enabling fast adaptation to a new scene with scarce training data. We
conduct extensive I2I translation experiments on human face and street view
datasets. Quantitative results validate that our approach can significantly
boost the performance of classical I2I translation models, such as PG2 and
Pix2Pix. Our visualization results show both higher image quality and more
appropriate instance-specific details; e.g., the translated image of a person
better preserves that person's identity.
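The GP/IS alternation can be sketched as a meta-learning loop: train an instance-specific model from the general-purpose initialization with a few gradient steps, then move the general-purpose model toward the adapted parameters. The sketch below is a minimal toy version, assuming each model is a plain parameter vector and the I2I objective is replaced by a per-scene squared-error loss; all names and the Reptile-style outer update are illustrative assumptions, not the paper's exact rule.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad(params, scene_target):
    # Gradient of the toy loss mean((params - scene_target)**2),
    # a stand-in for the real pixel-level translation objective.
    return 2.0 * (params - scene_target) / params.size

def bilevel_train(scene_targets, dim=8, outer_steps=200, inner_steps=5,
                  lr_gp=0.1, lr_is=0.5):
    gp = np.zeros(dim)                        # general-purpose (GP) model
    for _ in range(outer_steps):
        target = scene_targets[rng.integers(len(scene_targets))]
        # The instance-specific (IS) model starts from the GP
        # initialization and adapts with only a few gradient steps.
        is_model = gp.copy()
        for _ in range(inner_steps):
            is_model -= lr_is * grad(is_model, target)
        # Nudge the GP model toward initializations that adapt well
        # (a Reptile-style outer update; the paper's rule may differ).
        gp += lr_gp * (is_model - gp)
    return gp

# Three hypothetical "scenes", each with its own target parameters.
scene_targets = [rng.normal(loc=c, size=8) for c in (-1.0, 0.0, 1.0)]
gp = bilevel_train(scene_targets)
```

The key property is that `gp` is never fit to any single scene; it only absorbs the adapted IS parameters, so a fresh IS model warm-started from it needs few steps on scarce scene data.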
GM-NeRF: Learning Generalizable Model-based Neural Radiance Fields from Multi-view Images
In this work, we focus on synthesizing high-fidelity novel view images for
arbitrary human performers, given a set of sparse multi-view images. It is a
challenging task due to the large variation among articulated body poses and
heavy self-occlusions. To alleviate this, we introduce an effective,
generalizable framework, Generalizable Model-based Neural Radiance Fields
(GM-NeRF), to synthesize free-viewpoint images. Specifically, we propose a
geometry-guided attention mechanism to register the appearance code from
multi-view 2D images to a geometry proxy, which alleviates the misalignment
between the inaccurate geometry prior and the pixel space. On top of that, we
further perform neural rendering with partial gradient backpropagation for
efficient perceptual supervision, improving the perceptual quality of the
synthesis.
To evaluate our method, we conduct experiments on the synthetic datasets
THuman2.0 and Multi-Garment, and the real-world datasets GeneBody and
ZJU-MoCap. The results demonstrate that our approach outperforms
state-of-the-art methods in terms of novel view synthesis and geometric
reconstruction.
Comment: Accepted at CVPR 2023
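The core of geometry-guided attention is a cross-attention in which queries come from the geometry proxy and keys/values come from multi-view image features, so appearance is anchored to geometry even when the prior is imperfect. The toy sketch below illustrates that pattern only; the shapes, the linear query projection, and the single-head form are assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def geometry_guided_attention(proxy_vertices, view_feats, d=16):
    # proxy_vertices: (V, 3) coarse geometry prior (e.g. body-model vertices)
    # view_feats:     (N, d) appearance features pooled from multi-view images
    W_q = rng.normal(size=(3, d)) / np.sqrt(3)      # hypothetical projection
    q = proxy_vertices @ W_q                        # (V, d) geometry queries
    attn = softmax(q @ view_feats.T / np.sqrt(d))   # (V, N) attention weights
    return attn @ view_feats                        # (V, d) appearance code

verts = rng.normal(size=(64, 3))        # 64 proxy vertices
feats = rng.normal(size=(4 * 32, 16))   # features from 4 views, 32 each
appearance_code = geometry_guided_attention(verts, feats)
```

Because the attention weights are a softmax, each vertex's appearance code is a convex combination of multi-view features, which is what lets imprecise vertex positions still pick up plausible appearance.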
Cloth2Body: Generating 3D Human Body Mesh from 2D Clothing
In this paper, we define and study a new problem, Cloth2Body, whose goal is to
generate a 3D human body mesh from a 2D clothing image. Unlike the existing
human mesh recovery problem, Cloth2Body must address new challenges raised by
the partial observation of the input and the high diversity of the output.
Specifically, there are three challenges. First, how to locate and pose a
human body within the clothing. Second, how to effectively estimate the body
shape across various clothing types. Finally, how to generate diverse and
plausible results from a 2D clothing image. To this end,
we propose an end-to-end framework that accurately estimates a 3D body mesh,
parameterized by pose and shape, from a 2D clothing image. Along this line, we
first utilize Kinematics-aware Pose Estimation to estimate the body pose
parameters. A 3D skeleton is employed as a proxy, followed by an inverse
kinematics module, to boost estimation accuracy. We additionally design an
adaptive depth trick that better aligns the re-projected 3D mesh with the 2D
clothing image by disentangling the effects of object size and camera
extrinsics. Next, we propose Physics-informed Shape Estimation to estimate the
body shape parameters. The 3D shape parameters are predicted from partial body
measurements estimated from the RGB image, which not only improves pixel-wise
human-cloth alignment but also enables flexible user editing. Finally, we
design an Evolution-based Pose Generation method, a skeleton-transplanting
approach inspired by genetic algorithms, to generate diverse and plausible
poses during inference. As shown by
experimental results on both synthetic and real-world data, the proposed
framework achieves state-of-the-art performance and can effectively recover
natural and diverse 3D body meshes from 2D images that align well with the
clothing.
Comment: ICCV 2023 Poster
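"Skeleton transplanting" in the genetic-algorithm sense can be read as crossover over limb chains plus small mutations of joint angles. The sketch below is a hedged toy version of that idea: the joint grouping, the 16-dimensional pose vector, and the operators are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical grouping of a 16-dim pose vector into limb chains.
LIMBS = {"left_arm": slice(0, 4), "right_arm": slice(4, 8),
         "left_leg": slice(8, 12), "right_leg": slice(12, 16)}

def transplant(parent_a, parent_b):
    # Crossover: the child inherits each whole limb chain from one parent.
    child = parent_a.copy()
    for sl in LIMBS.values():
        if rng.random() < 0.5:
            child[sl] = parent_b[sl]
    return child

def mutate(pose, sigma=0.05):
    # Small Gaussian jitter on joint angles keeps children plausible
    # while adding diversity.
    return pose + rng.normal(scale=sigma, size=pose.shape)

def generate_poses(seed_poses, n_out=8):
    out = []
    for _ in range(n_out):
        i, j = rng.integers(len(seed_poses), size=2)
        out.append(mutate(transplant(seed_poses[i], seed_poses[j])))
    return np.stack(out)

seeds = [rng.uniform(-1.0, 1.0, size=16) for _ in range(4)]
poses = generate_poses(seeds)
```

Swapping whole chains (rather than individual joints) is what keeps each generated skeleton kinematically coherent while still mixing parents, which matches the "diverse yet plausible" goal stated in the abstract.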